Classifier Adaptation with Non-representative Training Data

Internal identifier: 001927 (Main/Exploration); previous: 001926; next: 001928

Authors: Sriharsha Veeramachaneni [United States]; George Nagy (computer scientist) [United States]

Source:

Lecture Notes in Computer Science [ 0302-9743 ] ; 2002.

RBID : ISTEX:BA6AC24A377F2F9A6379DAC3467543B5C8B7A845

Abstract

We propose an adaptive methodology to tune the decision boundaries of a classifier trained on non-representative data to the statistics of the test data to improve accuracy. Specifically, for machine-printed and hand-printed digit recognition we demonstrate that adapting the class means alone can provide considerable gains in recognition. On machine-printed digits we adapt to the typeface, on hand-print to the writer. We recognize the digits with a Gaussian quadratic classifier when the style of the test set is represented by a subset of the training set, and also when it is not represented in the training set. We compare unsupervised adaptation and style-constrained classification on isogenous test sets of five machine-printed and two hand-printed NIST data sets. Both estimating mean and imposing style constraints reduce the error-rate in almost every case, and neither ever results in significant loss. They are comparable under the first scenario (specialization), but adaptation is better under the second (new style). Adaptation is beneficial when the test is large enough (even if only ten samples of each class by one writer in a 100-dimensional feature space), but style-conscious classification is the only option with fields of only two or three digits.
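
The idea of adapting only the class means to unlabeled test data can be illustrated with a small sketch. It is not the authors' implementation: it assumes one-dimensional Gaussian class models (the paper works in a 100-dimensional feature space), keeps variances and priors fixed at their training values, and uses an EM-style update as one plausible way to re-estimate the means. All function names and the sample values are hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def classify(x, means, variances, priors):
    """Return the index of the class with the largest posterior for x."""
    scores = [p * gaussian_pdf(x, m, v) for m, v, p in zip(means, variances, priors)]
    return max(range(len(scores)), key=scores.__getitem__)

def adapt_means(samples, means, variances, priors, iterations=20):
    """EM-style re-estimation of the class means on unlabeled samples.

    Assumption of this sketch: variances and priors are left untouched,
    so only the means move toward the statistics of the test data.
    """
    means = list(means)
    for _ in range(iterations):
        weighted_sum = [0.0] * len(means)
        weight_total = [0.0] * len(means)
        for x in samples:
            joint = [p * gaussian_pdf(x, m, v)
                     for m, v, p in zip(means, variances, priors)]
            norm = sum(joint)
            if norm == 0.0:
                continue  # sample too far from every class to contribute
            for k, j in enumerate(joint):
                resp = j / norm  # posterior probability of class k given x
                weighted_sum[k] += resp * x
                weight_total[k] += resp
        means = [ws / wt if wt > 0.0 else m
                 for ws, wt, m in zip(weighted_sum, weight_total, means)]
    return means

# Hypothetical training statistics for two classes.
trained_means = [0.0, 4.0]
variances = [1.0, 1.0]
priors = [0.5, 0.5]

# Hypothetical unlabeled test samples written in a shifted "style".
test_samples = [0.9, 1.2, 1.5, 1.8, 2.4, 4.9, 5.2, 5.5, 5.8, 6.1]

adapted_means = adapt_means(test_samples, trained_means, variances, priors)
print("adapted means:", adapted_means)
print("label of 2.4 before:", classify(2.4, trained_means, variances, priors))
print("label of 2.4 after:", classify(2.4, adapted_means, variances, priors))
```

In this toy setting the sample 2.4 lies on the wrong side of the decision boundary implied by the training means and moves to the other side once the means have followed the test data, which is the effect the abstract describes for a style that the training set does not represent.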

Url:

https://api.istex.fr/document/BA6AC24A377F2F9A6379DAC3467543B5C8B7A845/fulltext/pdf

DOI: 10.1007/3-540-45869-7_17

Affiliations:

Links toward previous steps (curation, corpus...)

to stream Istex, to step Corpus: 000B14
to stream Istex, to step Curation: 000B01
to stream Istex, to step Checkpoint: 001041
to stream Main, to step Merge: 001A07
to stream Main, to step Curation: 001927

The document in XML format

<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Classifier Adaptation with Non-representative Training Data</title>
<author><name sortKey="Veeramachaneni, Sriharsha" sort="Veeramachaneni, Sriharsha" uniqKey="Veeramachaneni S" first="Sriharsha" last="Veeramachaneni">Sriharsha Veeramachaneni</name>
</author>
<author><name sortKey="Nagy, George" sort="Nagy, George" uniqKey="Nagy G" first="George" last="Nagy">George Nagy (informaticien)</name>
<affiliation><country>États-Unis</country>
<placeName><settlement type="city">Troy (New York)</settlement>
<region type="state">État de New York</region>
</placeName>
<orgName type="lab" n="5">Institut polytechnique Rensselaer</orgName>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:BA6AC24A377F2F9A6379DAC3467543B5C8B7A845</idno>
<date when="2002" year="2002">2002</date>
<idno type="doi">10.1007/3-540-45869-7_17</idno>
<idno type="url">https://api.istex.fr/document/BA6AC24A377F2F9A6379DAC3467543B5C8B7A845/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000B14</idno>
<idno type="wicri:Area/Istex/Curation">000B01</idno>
<idno type="wicri:Area/Istex/Checkpoint">001041</idno>
<idno type="wicri:doubleKey">0302-9743:2002:Veeramachaneni S:classifier:adaptation:with</idno>
<idno type="wicri:Area/Main/Merge">001A07</idno>
<idno type="wicri:Area/Main/Curation">001927</idno>
<idno type="wicri:Area/Main/Exploration">001927</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Classifier Adaptation with Non-representative Training Data</title>
<author><name sortKey="Veeramachaneni, Sriharsha" sort="Veeramachaneni, Sriharsha" uniqKey="Veeramachaneni S" first="Sriharsha" last="Veeramachaneni">Sriharsha Veeramachaneni</name>
<affiliation wicri:level="2"><country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Rensselaer Polytechnic Institute, 12180, Troy, NY</wicri:regionArea>
<placeName><region type="state">État de New York</region>
</placeName>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">États-Unis</country>
</affiliation>
</author>
<author><name sortKey="Nagy, George" sort="Nagy, George" uniqKey="Nagy G" first="George" last="Nagy">George Nagy (informaticien)</name>
<affiliation wicri:level="2"><country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Rensselaer Polytechnic Institute, 12180, Troy, NY</wicri:regionArea>
<placeName><region type="state">État de New York</region>
</placeName>
<placeName><settlement type="city">Troy (New York)</settlement>
<region type="state">État de New York</region>
</placeName>
<orgName type="lab" n="5">Institut polytechnique Rensselaer</orgName>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s">Lecture Notes in Computer Science</title>
<imprint><date>2002</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">BA6AC24A377F2F9A6379DAC3467543B5C8B7A845</idno>
<idno type="DOI">10.1007/3-540-45869-7_17</idno>
<idno type="ChapterID">17</idno>
<idno type="ChapterID">Chap17</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: We propose an adaptive methodology to tune the decision boundaries of a classifier trained on non-representative data to the statistics of the test data to improve accuracy. Specifically, for machine-printed and hand-printed digit recognition we demonstrate that adapting the class means alone can provide considerable gains in recognition. On machine-printed digits we adapt to the typeface, on hand-print to the writer. We recognize the digits with a Gaussian quadratic classifier when the style of the test set is represented by a subset of the training set, and also when it is not represented in the training set. We compare unsupervised adaptation and style-constrained classification on isogenous test sets of five machine-printed and two hand-printed NIST data sets. Both estimating mean and imposing style constraints reduce the error-rate in almost every case, and neither ever results in significant loss. They are comparable under the first scenario (specialization), but adaptation is better under the second (new style). Adaptation is beneficial when the test is large enough (even if only ten samples of each class by one writer in a 100-dimensional feature space), but style-conscious classification is the only option with fields of only two or three digits.</div>
</front>
</TEI>
<affiliations><list><country><li>États-Unis</li>
</country>
<region><li>État de New York</li>
</region>
<settlement><li>Troy (New York)</li>
</settlement>
<orgName><li>Institut polytechnique Rensselaer</li>
</orgName>
</list>
<tree><country name="États-Unis"><region name="État de New York"><name sortKey="Veeramachaneni, Sriharsha" sort="Veeramachaneni, Sriharsha" uniqKey="Veeramachaneni S" first="Sriharsha" last="Veeramachaneni">Sriharsha Veeramachaneni</name>
</region>
<name sortKey="Nagy, George" sort="Nagy, George" uniqKey="Nagy G" first="George" last="Nagy">George Nagy (informaticien)</name>
<name sortKey="Veeramachaneni, Sriharsha" sort="Veeramachaneni, Sriharsha" uniqKey="Veeramachaneni S" first="Sriharsha" last="Veeramachaneni">Sriharsha Veeramachaneni</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001927 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001927 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:BA6AC24A377F2F9A6379DAC3467543B5C8B7A845
   |texte=   Classifier Adaptation with Non-representative Training Data
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024

	OCR exploration server
	Warning: this site is under development! It was generated automatically from raw corpora, so the information has not been validated.
